Title: 22438, 23553, 25278, and 26212,
Novel Human Sulfatases
Inventor(s): Glucksmann et al.
Application No: 09/495,823
Atty Dkt No: 5800-79(35800/191890)


FIG. 1A. 

Sequence length 2175 

CACGCGTCCGCAAATTTCCTGATTCTTTTGAATTAGGATTCCAGATGGGGGCCTCATTTCTACAGCCCCCAACATTCCT

ATAGCCGTTATCACTGCCATCACCACTGCCACCAGCATCTTCTTGCAGATTCCACCCCTGCTCCCCAGAGACTTCCTGC

TTTGAAAGTGAGCAGAAAGGAAGCTCTCAGAAAAATCTCTAGTGGTGGCTGCCGTCGCTCCAGACAATCGGAATCCTGC

MGWLFLKVLLAGVSFSG 17

CTTCACCACC ATG GGC TGG CTT TTT CTA AAG GTT TTG TTG GCG GGA GTG AGT TTC TCA GGA 51

FLYPLVDFCISGKTRGQKPN 37

TTT CTT TAT CCT CTT GTG GAT TTT TGC ATC AGT GGG AAA ACA AGA GGA CAG AAG CCA AAC 111

FVIILADDMGWGDLGANWAE 57

TTT GTG ATT ATT TTG GCC GAT GAC ATG GGG TGG GGT GAC CTG GGA GCA AAC TGG GCA GAA 171

TKDTANLDKMASEGMRFVDF 77

ACA AAG GAC ACT GCC AAC CTT GAT AAG ATG GCT TCG GAG GGA ATG AGG TTT GTG CAT TTC 231

HAAASTCSPSRASLLTGRLG 97 

CAT GCA GCT GCC TCC ACC TGC TCA CCC TCC CGG GCT TCC TTG CTC ACC GGC CGG CTT GGC 291 

LRNGVTRNFAVTSVGGLPLN 117 

CTT CGC AAT GGA GTC ACA CGC AAC TTT GCA GTC ACT TCT GTG GGA GGC CTT CCG CTC AAC 351 

ETTLAEVLQQAGYVTGIIGK 137 

GAG ACC ACC TTG GCA GAG GTG CTG CAG CAG GCG GGT TAC GTC ACT GGG ATA ATA GGC AAA 411 

WHLGHHGSYHPNFRGFDYYF 157

TGG CAT CTT GGA CAC CAC GGC TCT TAT CAC CCC AAC TTC CGT GGT TTT GAT TAC TAC TTT 471 

GIPYSHDMGCTDTPGYNHPP 177 

GGA ATC CCA TAT AGC CAT GAT ATG GGC TGT ACT GAT ACT CCA GGC TAC AAC CAC CCT CCT 531 

CPACPQGDGPSRNLQRDCYT 197

TGT CCA GCG TGT CCA CAG GGT GAT GGA CCA TCA AGG AAC CTT CAA AGA GAC TGT TAC ACT 591 

DVALPLYENLNIVEQPVNLS 217 

GAC GTG GCC CTC CCT CTT TAT GAA AAC CTC AAC ATT GTG GAG CAG CCG GTG AAC TTG AGC 651 

SLAQKYAEKATQFIQRASTS 237 

AGC CTT GCC CAG AAG TAT GCT GAG AAA GCA ACC CAG TTC ATC CAG CGT GCA AGC ACC AGC 711 

GRPFLLYVALAHMHVPLPVT 257 

GGG AGG CCC TTC CTG CTC TAT GTG GCT CTG GCC CAC ATG CAC GTG CCC TTA CCC GTG ACT 771 

QLPAAPRGRSLYGAGLWEMD 277 

CAG CTA CCA GCA GCG CCA CGG GGC AGA AGC CTG TAT GGT GCA GGG CTC TGG GAG ATG GAC 831 

SLVGQIKDKVDHTVKENTFL 297

AGT CTG GTG GGC CAG ATC AAG GAC AAA GTT GAC CAC ACA GTG AAG GAA AAC ACA TTC CTC 891 

WFTGDNGPWAQKCELAGSVG 317

TGG TTT ACA GGA GAC AAT GGC CCG TGG GCT CAG AAG TGT GAG CTA GCG GGC AGT GTG GGT 951 

PFTGFWQTRQGGSPAKQTTW 337

CCC TTC ACT GGA TTT TGG CAA ACT CGT CAA GGG GGA AGT CCA GCC AAG CAG ACG ACC TGG 1011

EGGHRVPALAYWPGRVPVNV 357

GAA GGA GGC CAC CGG GTC CCA GCA CTC CCA GCA CTG GCT TAC TGG CCT GGC AGA GTT CCA GTT AAT GTC 1071

CAGACCAATTTTTATTCCACGAGGAGGAGTACCTGGAAATTAGGCAAGTTTGCTTCCAAATTTCATTTTTACCCTCTTT
ACAAACACACGCTTTAGTUAGTCTTGGAGTTTAGTTTTGGAGTTAGCCTTGCATATCCCTTCTGTATCCTGTCCCTCC
TCCACGCCGACCCGAGAGCAGCTGAGCTGCGCTGGCTCTGGGCACCCAGTGTGCCTTAATGGGAAGCACACGGGCTTTG
GAGTCAGGCACAGGTGCCAGCTCCAGCTTTTGAACTTGGGCAATTGTTTAACCTAACCTGCAAGTTGATTTTGAGGGTT
AAATAAAGGCATACATGAAAAAAAAAAAAAAAAA

FIG. 1B. 
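
The residue rows in FIG. 1A are the translation of the codon rows printed beneath them. The following is a minimal sketch, assuming the standard genetic code and whitespace-separated codons, of how one codon row can be re-translated to cross-check the residue row above it. The function name `translate_row` is illustrative and not part of the original figure; codons that cannot be read (for example, scan damage) are reported as `?`.

```python
from itertools import product

# Standard genetic code, enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_row(codon_row: str) -> str:
    """Translate one whitespace-separated codon row.

    Purely numeric tokens (the running nucleotide counter at the end of a
    row) are skipped; unreadable codons are returned as '?'.
    """
    codons = [tok.upper() for tok in codon_row.split() if not tok.isdigit()]
    return "".join(CODON_TABLE.get(codon, "?") for codon in codons)


# Codon row for residues 98-117 of FIG. 1A.
row = ("CTT CGC AAT GGA GTC ACA CGC AAC TTT GCA GTC ACT TCT GTG GGA GGC "
       "CTT CCG CTC AAC 351")
print(translate_row(row))  # LRNGVTRNFAVTSVGGLPLN
```

A mismatch between the printed residue row and the re-translated codon row points to a misread character in one of the two rows, not necessarily in the residue row.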
FIG. 4.



Prosite Pattern Matches 

Prosite version: Release 12.2 of February 1995

>PS00001/ PDOC00001/ASN_GLYCOSYLATION N-glycosylation site.

Query: 117 NETT 120 
Query: 215 NLSS 218 
Query: 356 NVTS 359 
Query: 497 NISS 500 

>PS00005/ PDOC00005/PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.

Query: 28 SGK 30 
Query: 93 TGR 95 
Query: 237 SGR 239
Query: 290 TVK 292
Query: 422 TVR 424 

>PS00006/ PDOC00006/CK2_PHOSPHO_SITE Casein kinase II phosphorylation site. 

Query: 120 TLAE 123 

Query: 290 TVKE 293 

Query: 335 TTWE 338 

Query: 364 SVLD 367 

Query: 444 TGPE 447 

Query: 499 SSAD 502 
>PS00008/ PDOC00008/MYRISTYL N-myristoylation site. 

Query: 12 GVSFSG 17 

Query: 33 GQKPNF 38 

Query: 52 GANWAE 57 

Query: 97 GLRNGV 102 

Query: 113 GLPLNE 118 

Query: 158 GIPYSH 163 

Query: 328 GGSPAK 333 

Query: 388 GVDVSE 393 

Query: 418 GALQTV 423

Query: 435 GGARAC 440
>PS00009/ PDOC00009/AMIDATION Amidation site.

Query: 382 QGRR 385



>PS00149/ PDOC00117/SULFATASE_2 Sulfatases signature 2.
Query: 129 GYVTGIIGKW 138
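
The short Prosite site patterns listed above can be re-run against any stretch of the translated sequence. The following is a sketch, not the Prosite scanner itself: it assumes the published pattern definitions N-{P}-[ST]-{P} (PS00001), [ST]-x-[RK] (PS00005) and [ST]-x(2)-[DE] (PS00006), written here as Python regular expressions. The names `PATTERNS` and `scan` are illustrative, and the test fragment is residues 98-137 of FIG. 1A.

```python
import re

# Prosite patterns rewritten as regular expressions (an assumption of this
# sketch: {P} becomes [^P], x becomes ".", x(2) becomes "..").
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",      # PS00006: [ST]-x(2)-[DE]
}


def scan(sequence: str, first_residue: int = 1):
    """Return (pattern name, start, motif, end) for every match.

    A lookahead is used so that overlapping matches are all reported;
    positions are 1-based and offset by the number of the first residue.
    """
    hits = []
    for name, pattern in PATTERNS.items():
        for match in re.finditer(f"(?=({pattern}))", sequence):
            motif = match.group(1)
            start = first_residue + match.start()
            hits.append((name, start, motif, start + len(motif) - 1))
    return hits


# Residues 98-137 of FIG. 1A (two consecutive residue rows).
fragment = "LRNGVTRNFAVTSVGGLPLN" + "ETTLAEVLQQAGYVTGIIGK"
for hit in scan(fragment, first_residue=98):
    print(hit)
```

For this fragment the scan reports the same two sites that the listing gives for this region, 117 NETT 120 and 120 TLAE 123.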



FIG. 5A.



Input file Fbh23553f I.seq; Output File 23553.trans
Sequence length 4321 

CCCACGCGTCCGGCTAATGAATCTTGGGGCCGGTGTCGGGCCGGGGCGGCTTGATCGGCAACTAGGAAACCCCAGGCGC

AGAGGCCAGGAGCGAGGGCAGCGAGGATCAGAGGCCAGGCCTTCCCGGCTGCCGGCGCTCCTCGGAGGTCAGGGCAGAt' 

GAGGAACATGACTCTCCCCCTTCGGAGGAGGAAGGAAGTCCCGCTGCCACCTTATCTCTGCTCCTCTGCCTCCTCCCTG 

TTCCCAGAGCTTTTTCTCTAGAGAAGATTTTGAAGGCGGCTTTTGTGCTGACGGCCACCCACCATCATCTAAAGAAGAT 

AAACTTGGCAAATGACATGCAGGTTCTTCAAGGCAGAATAATTGCAGAAAATCTTCAAAGGACCCTATCTGCAGATGTT 

CTGAATACCTCTGAGAATAGAGATTGATTATTCAACCAGGATACCTAATTCAAGAACTCCAGAAATCAGGAGACGGAGA

FIG. 5B.

TCT GTG GAG AGG CTG TAT AAC ATG CTC GTG GAG ACG GGG GAG CTG GAG AAT ACT TAC ATC 933

I Y T A D H G Y H I G Q F G L V K G K S 331

ATT TAC ACC GCC GAC CAT GGT TAC CAT ATT GGG CAG TTT GCA CTG GTC AAG GGG AAA TCC 993 

M P Y D F D I R V P F F I R G P S V E P 351 

ATG CCA TAT GAC TTT GAT ATT CGT GTG CCT TTT TTT ATT CGT GGT CCA AGT GTA GAA CCA 1053 

GSIVPQIVLNIDLAPTI L D I 371 

GGA TCA ATA GTC CCA CAG ATC GTT CTC AAC ATT GAC TTG GCC CCC ACG ATC CTG GAT ATT 1113 

A G LDTPPDYDGKSVLKLLDP 391 

GCT GGG CTC GAC ACA CCT CCT GAT GTG GAC GGC AAG TCT GTC CTC AAA CTT CTG GAC CCA 1173 

EKPGNRFRTNKKAKIWRDTF 411 

GAA AAG CCA GGT AAC AGG TTT CGA ACA AAC AAG AAG GCC AAA ATT TGG CGT GAT ACA TTC 1233 

LVER GKF LRKKEESSKNI QQ 431 

CTA GTG GAA AGA GGC AAA TTT CTA CGT AAG AAG GAA GAA TCC AGC AAG AAT ATC CAA CAG 1293 

SNHLPKYERVKELCQQARYQ 451 

TCA AAT CAC TTG CCC AAA TAT GAA CGG GTC AAA GAA CTA TGC CAG CAG GCC AGG TAC CAG 1353 

TACEQPGQKWQCIEDTSGKL 471 

ACA GCC TGT GAA CAA CCG GGG CAG AAG TGG CAA TGC ATT GAG GAT ACA TCT GGC AAG CTT 1413 

RIHKCKGPSDLLTVRQSTRN 491 

CGA ATT CAC AAG TGT AAA GGA CCC AGT GAC CTG CTC ACA GTC CGG CAG AGC ACG CGG AAC 1473 

LYARGFHDKD KECSCRESGY 511 

CTC TAC GCT CGC GGC TTC CAT GAC AAA GAC AAA GAG TGC AGT TGT AGG GAG TCT GGT TAC 1533 

RASRSQRKSQRQFLR NQGTP 531 

CGT GCC AGC AGA AGC CAA AGA AAG AGT CAA CGG CAA TTC TTG AGA AAC CAG GGG ACT CCA 1593

KYKPRFVH TRQ TRSLSVEFE 551 

AAG TAC AAG CCC AGA TTT GTC CAT ACT CGG CAG ACA CGT TCC TTG TCC GTC GAA TTT GAA 1653 

GEIYDINLEEEEELQVLQPR 571 

GGT GAA ATA TAT GAC ATA AAT CTG GAA GAA GAA GAA GAA TTG CAA GTG TTG CAA CCA AGA 1713 

NIAKRHDEGHKGPRDLQASS 591 

AAC ATT GCT AAG CGT CAT GAT GAA GGC CAC AAG GGG CCA AGA GAT CTC CAG GCT TCC AGT 1773 

GGNRGRMLADSSNAVGPPTT 611 

GGT GGC AAC AGG GGC AGG ATG CTG GCA GAT AGC AGC AAC GCC GTG GGC CCA CCT ACC ACT 1833 

VRVTHKCFILPNDSIHCERE 631 

GTC CGA GTG ACA CAC AAG TGT TTT ATT CTT CCC AAT GAC TCT ATC CAT TGT GAG AGA GAA 1893 

L Y Q S A R A W K D H K A Y I D K E I E 651

CTG TAC CAA TCG GCC AGA GCG TGG AAG GAC CAT AAG GCA TAC ATT GAC AAA GAG ATT GAA 1953 

A L Q D KIKNLREVRGHLK R R K 671 

GCT CTG CAA GAT AAA ATT AAG AAT TTA AGA GAA GTG AGA GGA CAT CTG AAG AGA AGG AAG 2013 

PEECSCSKQSYYNKEKGVKK 691

CCT GAG GAA TGT AGC TGC AGT AAA CAA AGC TAT TAC AAT AAA GAG AAA GGT GTA AAA AAG 2073

Q E K L K S H L H P F K E A A Q E V D S 711

CAA GAG AAA TTA AAG AGC CAT CTT CAC CCA TTC AAG GAG GCT GCT CAG GAA GTA GAT AGC 2133 
FIG. 5C.

KLQLFKENNRRRKKERKEKR 731
AAA CTG CAA CTT TTC AAG GAG AAC AAC CGT AGG AGG AAG AAG GAG AGG AAG GAG AAG AGA 2193

RQRKGEECSLPGLTCFTHDN 751
CGG CAG AGG AAG GGG GAA GAG TGC AGC CTG CCT GGC CTC ACT TGC TTC ACG CAT GAC AAC 2253

NHWQTAPFWNLGSFCACTSS 771
AAC CAC TGG CAG ACA GCC CCG TTC TGG AAC CTG GGA TCT TTC TGT GCT TGC ACG AGT TCT 2313

NNNTYWCLRTVNETHNFLFC 791
AAC AAT AAC ACC TAC TGG TGT TTG CGT ACA GTT AAT GAG ACG CAT AAT TTT CTT TTC TGT 2373

EFATGFLEYFDMNTDPYQLT 811
GAG TTT GCT ACT GGC TTT TTG GAG TAT TTT GAT ATG AAT ACA GAT CCT TAT CAG CTC ACA 2433

NTVHTVERGILNQLHVQLME 831
AAT ACA GTG CAC ACG GTA GAA CGA GGC ATT TTG AAT CAG CTA CAC GTA CAA CTA ATG GAG 2493

LRSCQGYKQCNPRPKNLDVG 851
CTC AGA AGC TGT CAA GGA TAT AAG CAG TGC AAC CCA AGA CCT AAG AAT CTT GAT GTT GGA 2553

NKDGGSYDLHRGQLWDGWEG 871
AAT AAA GAT GGA GGA AGC TAT GAC CTA CAC AGA GGA CAG TTA TGG GAT GGA TGG GAA GGT 2613

X 872 
TAA 2616 

TCAGCCCCGTCTCACTGCAGACATCAACTGGCAAGGCCTAGAGGAGCTACACAGTGTGAATGAAAACATCTATGAGTAC

AGACAAAACTACAGACTTAGTCTGGTGGACTGGACTAATTACTTGAAGGATTTAGATAGAGTATTTGCACTGCTGAAGA

GTCACTATGAGCAAAATAAAACAAATAAGACTCAAACTGCTCAAAGTGACGGGTTCTTGGTTGTCTCTGCTGAGCACGC

TGTGTCAATGGAGATGGCCTCTGCTGACTCAGATGAAGACCCAAGGCATAAGGTTGGGAAAACACCTCATTTGACCTTG

CCAGCTGACCTTCAAACCCTGCATTTGAACCGACCAACATTAAGTCCAGAGAGTAAACTTGAATCGAATAACGACATTC

CAGAAGTTAATCATTTGAATTCTGAACACTGGAGAAAAACCGAAAAATGGACGGGGCATGAAGAGACTAATCATCTGGA

AACCGATTTCAGTGGCGATGGCATGACAGAGCTAGAGCTCGGGCCCAGCCCCAGGCTGCAGCCCATTCGCAGGCACCCG

AAAGAACTTCCCCAGTATGGTGGTCCTGGAAAGGACATTTTTGAAGATCAACTATATCTTCCTGTGCATTCCGATGGAA 

TTTCAGTTCATCAGATGTTCACCATGGCCACCGCAGAACACCGAAGTAATTCCAGCATAGCGGGGAAGATGTTGACCAA 

GGTGGAGAAGAATCACGAAAAGGAGAAGTCACAGCACCTAGAAGGCAGCGCCTCCTCTTCACTCTCCTCTGATTAGATG 

AAACTGTTACCCTTACCTAAACACAGTATTTCTTTTTAACTTTTTTATTTGTAAACTAATAAAGGKAATCACAGCCACC 

AACATTCCAAGCTACCCTGGGTACCTTTGTGCAGTAGAAGCTAGTGAGCATGTGAGCAAGCGGTGTGCACACGGAGACT 

CATCGTTATAATTTACTATCTGCCAAGGAGTAGAAAGAAAGGCTGGGGATATTTGGGTTGGCTTTGGKTTTGATTTTTT

GCTTGGTTGGTTGGTTTGKACTAAAACAGTATTATCTTTTGAATATCGTAGGGACATAARKWVWWWMMWKKTyWTCMAW 

YMRAKAKGSYVRRAWKGGGSTYTYTSKKRKSTMWAMVYKVSCMCCYSKKRVVAWTYyYWMMYVCMYKYTSSSTGRYKRN 

KTAATGAAGTT 
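
Each full residue row in these figures advances the residue counter by 20 and each codon row advances the nucleotide counter by 60, so a counter that breaks the progression is most likely a misread digit. The following is a small sketch of that check; the function name `find_counter_breaks` and the example values are illustrative (a run 651, 671, 691, 711 with the third value misread as 291).

```python
def find_counter_breaks(counters, step):
    """Return (index, found, expected) for counters off the fixed-step run.

    The first counter is taken as the reference, so a single misread value
    is reported once instead of also flagging the row that follows it.
    """
    first = counters[0]
    breaks = []
    for index, found in enumerate(counters):
        expected = first + index * step
        if found != expected:
            breaks.append((index, found, expected))
    return breaks


# Illustrative residue counters with one misread value.
residue_counters = [651, 671, 291, 711]
print(find_counter_breaks(residue_counters, step=20))  # [(2, 291, 691)]

# Nucleotide counters for the same rows are consistent.
print(find_counter_breaks([1953, 2013, 2073, 2133], step=60))  # []
```

The check only applies to runs of full rows; a short first or last row (such as the 17-residue first row of FIG. 1A) has to be excluded or handled separately.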
Prosite Pattern Matches for 23553
Prosite version: Release 12.2 of February 1995

>PS00001/ PDOC00001/ASN_GLYCOSYLATION N-glycosylation site.

Query: 64 NKTR 67 

Query: 111 NCSS 114 

Query: 131 NNTG 134 

Query: 148 NGSY 151 

Query: 170 NYTV 173 

Query: 197 NESI 200 

Query: 240 NASQ 243 

Query: 623 NDSI 626 

Query: 773 NNTY 776 

Query: 783 NETH 786 

>PS00005/ PDOC00005/PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.

Query: 24 TVR 26

Query: 27 SPR 29

Query: 66 TRK 68

Query: 96 TGK 98

Query: 206 SKR 208

Query: 400 TNK 402

Query: 425 SSK 427

Query: 468 SGK 470

Query: 484 TVR 486

Query: 488 STR 490

Query: 505 SCR 507

Query: 516 SQR 518

Query: 520 SQR 522

Query: 530 TPK 532

Query: 611 TVR 613

Query: 615 THK 617

Query: 635 SAR 637



> PS00006/ PDOC00006/CK2_PHOSPHO_SITE Casein kinase II phosphorylation site. 

Query: 107 TNNE 110 

Query: 288 SVDO 291 

Query: 367 TILD 370 

Query: 376 TPPD 379 

Query: 452 TACE 455 

Query: 505 SCRE 508

Query: 781 TVNE 784

FIG. 8A.

>PS00007/ PDOC00007/TYR_PHOSPHO_SITE Tyrosine kinase phosphorylation site.
Query: 637 RAWKDHKAY 645 
> PS00008/ PDOC00008/MYRISTYL N-myristoylation site. 

Query: 19 GSLCST 24
Query: 161 GLIKNS 166 
Query: 325 GLVKGK 330 
Query: 592 GGNRGR 597 
Query: 763 GSFCAC 768 
Query: 851 GNKDGG 856 

> PS00523/ PDOC00117/SULFATASE_1 Sulfatases signature 1. 

Query: 85 PMCCPSRSSMLTG 97

FIG. 8B.
Figure labels: Lung T, Lung T, Lung N, Breast T
FIG. 10A.

Input file Fbh25278FLl.seq; Output File 25278.trans

Sequence length 2940

CCACGCGTCCGCCCACGCGTCCGGCTGCCACGCCGCGTCTCAGGCTGGCCGGGCTGAGCCGGGGAAGAGGGAGCAAAGG
CGGCGCAGGGCCTGCGCTTAGGCAGCGGGAGGCAGCTCGGCGCGGGCCTGACCTCCCCAGAGCGCCCCGCTGCGGCCGA
GCAGATCCGGCCCAGCCGTCCGGCAGCCAGTCCCGGACCAGACACTGGACCGTCCCCGGGGGCGCTGAACTCCTCGC
AGCATCCGAGCCGGCGGGCCGGTGGTGCGCCCTGGGCGCGCGAGGTGGTGAGGCCCCAGGAGCCCGGCGCGCCGGGACA







MHTLTGFSLVSLLSF 15

CGCGGGCCGGCTTGGCG ATG CAC ACC CTC ACT GGC TTC TCT CTG GTC AGC CTG CTC AGC TTC 45

GYLSWDWAKPSFVADGPGEA 35

GGC TAC CTG TCC TGG GAC TGG GCC AAG CCG AGC TTC GTG GCC GAC GGG CCC GGG GAG GCT 105

GEQPSAAPPQPPHIIFILTD 55

GGC GAG CAG CCC TCG GCC GCT CCG CCC CAG CCT CCC CAC ATC ATC TTC ATC CTC ACG GAC 165

DQGYHDVGYHGSDIETPTLD 75

GAC CAA GGC TAC CAC GAC GTG GGC TAC CAT GGT TCA GAT ATC GAG ACC CCT ACG CTG GAC 225

RLAAKGVKLENYYIQPICTP 95

AGG CTG GCG GCC AAG GGG GTC AAG TTG GAG AAT TAT TAC ATC CAG CCC ATC TGC ACG CCT 285

SRSQLLTGRYQIHTGLQHSI 115

TCG CGG AGC CAG CTC CTC ACT GGC AGG TAC CAG ATC CAC ACA GGA CTC CAG CAT TCC ATC 345

IRPQQPNCLPLDQVTLPQKL 135

ATC CGC CCA CAG CAG CCC AAC TGC CTG CCC CTG GAC CAG GTG ACA CTG CCA CAG AAG CTG 405

QEAGYSTHMVGKWHLGFYRK 155

CAG GAG GCA GGT TAT TCC ACC CAT ATG GTG GGC AAG TGG CAC CTG GGC TTC TAC CGG AAG 465

ECLPTRRGFDTFLGSLTGNV 175

GAG TGT CTG CCC ACC CGT CGG GGC TTC GAC ACC TTC CTG GGC TCG CTC ACG GGC AAT GTG 525

DYYTYDNCDGPGVCGFDLHE 195

GAC TAT TAC ACC TAT GAC AAC TGT GAT GGC CCA GGC GTG TGC GGC TTC GAC CTG CAC GAG 585

GENVAWGLSGQYSTMLYAQR 215

GGT GAG AAT GTG GCC TGG GGG CTC AGC GGC CAG TAC TCC ACT ATG CTT TAC GCC CAG CGC 645

ASHILASHSPQRPLFLYVAF 235

GCC AGC CAT ATC CTG GCC AGC CAC AGC CCT CAG CGT CCC CTC TTC CTC TAT GTG GCC TTC 705

QAVHTPLQSPREYLYRYRTM 255

CAG GCA GTA CAC ACA CCC CTG CAG TCC CCT CGT GAG TAC CTG TAC CGC TAC CGC ACC ATG 765

GNVARRKYAAMVTCMDEAVR 275

GGC AAT GTG GCC CGG CGG AAG TAC GCG GCC ATG GTG ACC TGC ATG GAT GAG GCT GTG CGC 825

NITWALKRYGFYNNSVIIFS 295

AAC ATC ACC TGG GCC CTC AAG CGC TAC GGT TTC TAC AAC AAC AGT GTC ATC ATC TTC TCC 885

SDNGGQTFSGGSNWPLRGRK 315

AGT GAC AAT GGT GGC CAG ACT TTC TCG GGG GGC AGC AAC TGG CCG CTC CGA GGA CGC AAG 945

GTYWEGGVRGLGFVHSPLLK 335

GGC ACT TAT TGG GAA GGT GGC GTG CGG GGC CTA GGC TTT GTC CAC AGT CCC CTG CTC AAG 1005

RKQRTSRALMHITDWYPTLV 355

CCA AAG CAA CGG ACA AGC CGG GCA CTG ATG CAC ATC ACT GAC TGG TAC CCG ACC CTG GTG 1065 

GLAGGTTSAADGLDGYDVWP 375 

GGT CTG GCA GGT GGT ACC ACC TCA GCA GCC GAT GGG CTA GAT GGC TAC GAC GTG TGG CCG 1125

AISEGRASPRTEILHNIDPL 395

GCC ATC AGC GAG GGC CGG GCC TCA CCA CGC ACG GAG ATC CTG CAC AAC ATT GAC CCA CTC 1185 

YNHAQHGSLEGGFGIWNTAV 415 

TAC AAC CAT GCC CAG CAT GGC TCC CTG GAG GGC GGC TTT GGC ATC TGG AAC ACC GCC GTG 1245 

Q A A I R V G E W K L L T G D P G Y G D 435 

CAG GCT GCC ATC CGC GTG GGT GAG TGG AAG CTG CTG ACA GGA GAC CCC GGC TAT GGC GAT 1305 

WIPPQTLATFPGSWWNLERM 455

TGG ATC CCA CCG CAG ACA CTG GCC ACC TTC CCG GGT AGC TGG TGG AAC CTG GAA CGA ATG 1365 

ASVRQAVWLFNISADP YERE 475 

GCC AGT GTC CGC CAG GCC GTG TGG CTC TTC AAC ATC AGT GCT GAC CCT TAT GAA CGG GAG 1425 

DLAGQRPDVVRTLLARLAEY 495 

GAC CTG GCT GGC CAG CGG CCT GAT GTG GTC CGC ACC CTG CTG GCT CGC CTG GCC GAA TAT 1485 

NRTAIPVRYPAENPRAHPDF 515 

AAC CGC ACA GCC ATC CCG GTA CGC TAC CCA GCT GAG AAC CCC CGG GCT CAT CCT GAC TTT 1545 

NGGAWGPWASDEEEEEEEGR 535

AAT GGG GGT GCT TGG GGG CCC TGG GCC AGT GAT GAG GAA GAG GAG GAA GAG GAA GGG AGG 1605 

ARSFSRGRRKKKCKICKLRS 555

GCT CGA AGC TTC TCC CGG GGT CGT CGC AAG AAA AAA TGC AAG ATT TGC AAG CTT CGA TCC 1665

FFRKLNTRLMSQRI* 570

TTT TTC CGT AAA CTC AAC ACC AGG CTA ATG TCC CAA CGG ATC TGA 1710 

TGGTGGGGAGGGAGAAAACTGTCCTTTAGAGGATCTTCCCCACTCCGGCTTGGCCCTGCTGTTTCTCAGGGAGAAGCCT
GTCACATCTCCATCTACAGGGAGTTGGAGGGTGTAGAGTCCCTTGGTTGAACAGGGTAGGGAGCCTGGATAGGAGTGGG
TGGGAATAAACCAGACTGGGATGCCTGTGTCTCAGTCCTGCCTCCTCACGGACTTGCTCTGTGACCTCAGGTGACCCAC
ATGAGCTTTTAGCCTCAGTTTCCTCATCTGTAAAATGAGCTCTAATGACTTTGTGACTCTTTGGTGTGGCCCTGGAGCC 
TGGGGCCACGGTGGAGTTCCTGGCCGGCCTTGCCACTTGACAACTCCTTTAAGGCTTCCCCCTTAACACGGGATCCCTG 
TGGTGGTGTTTGGGAGTTGCCTGGAGGCAACTCCAAGCCTGGCCCCCAGCTGAAGCATGGCAATCTGGCTGCTCTCTAC 
AGGGACCCCCAAGCGCTGTGGGTGGAGGGCAGGGGTCGGGGGGGTTGACCTTCTTGGGTCTTCACATGGCCTAGGCCAG 
TCCTCCGGTCAGACTGGTGTCAGGCACCGTGGTGCAAAATTCCTCTTCTGGCCCCTCCAGTACCCAGAGAAACTGGCTG 
GGCCATTAACTGCTGCAGCACCAAGGGTGGTAGAAAGAGCTGTGAAGAGCCCCCAAACCAGTACCAGGACACCTGGGTT 
CTCCTGTGACCTGGGGCACAGTTCTTGCCCTCTAGGCCTTGATTTCCCCACCTGCAAGTGGGGATGCCAGCCCTGGCTC 
TGCCTCCTTCATGAGGCTCTGGAAGACTGGCCAAGGTTGTGGAGGAGCTTGTGAACTTGATTAAAGTGTCGTAACATGG
AAAAAAAAAAAAAAAAAAAAAAGGGCGG 



FIG. 10B. 
FIG. 13.

Prosite Pattern Matches for 25278
Prosite version: Release 12.2 of February 1995
>PS00001/ PDOC00001/ASN_GLYCOSYLATION N-glycosylation site.
Query: 276 NITW 279
Query: 288 NNSV 291
Query: 466 NISA 469
Query: 496 NRTA 499

>PS00004/ PDOC00004/CAMP_PHOSPHO_SITE cAMP- and cGMP-dependent protein kinase phosphorylation site.
Query: 314 RKGT 317

>PS00005/ PDOC00005/PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.

Query: 102 TGR 104

Query: 160 TRR 162

Query: 244 SPR 246

Query: 340 TSR 342

Query: 383 SPR 385

Query: 457 SVR 459

Query: 566 SQR 568

>PS00006/ PDOC00006/CK2_PHOSPHO_SITE Casein kinase II phosphorylation site.

Query: 67 SDIE 70

Query: 244 SPRE 247

Query: 268 TCMD 271

Query: 317 TYWE 320

Query: 363 SAAD 366

Query: 525 SDEE 528

>PS00007/ PDOC00007/TYR_PHOSPHO_SITE Tyrosine kinase phosphorylation site.

Query: 134 KLQEAGY 140

>PS00008/ PDOC00008/MYRISTYL N-myristoylation site.

Query: 110 GLQHSI 115

Query: 169 GSLTGN 174

Query: 205 GQYSTM 210

Query: 300 GQTFSG 305

Query: 321 GGVRGL 326

Query: 356 GLAGGT 361

Query: 402 GSLEGG 407

Query: 409 GIWNTA 414

Query: 447 GSWWNL 452

>PS00009/ PDOC00009/AMIDATION Amidation site.

Query: 312 RGRK 315
Query: 541 RGRR 544

>PS00149/ PDOC00117/SULFATASE_2 Sulfatases signature 2.
Query: 139 GYSTHMVGKW 148
>PS00523/ PDOC00117/SULFATASE_1 Sulfatases signature 1.
Query: 91 PICTPSRSQLLTG 103
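
Each Query line gives a start position, the matched residues, and an end position, so it can be checked directly against the numbered residue rows of FIG. 10B. The following is a minimal sketch of that check, assuming every row ends at the residue number printed at its right. The helper names `residue_map` and `hit_matches` are illustrative, and the three rows are those ending at 475, 495 and 515.

```python
def residue_map(rows):
    """Build {position: residue} from (residue_row, end_number) pairs.

    Spaces inside a row are ignored; the start position of each row is
    derived from its end number and its length.
    """
    positions = {}
    for residues, end in rows:
        letters = residues.replace(" ", "")
        start = end - len(letters) + 1
        for offset, aa in enumerate(letters):
            positions[start + offset] = aa
    return positions


def hit_matches(positions, start, motif, end):
    """True if the motif occupies exactly positions start..end."""
    if end - start + 1 != len(motif):
        return False
    return all(positions.get(start + i) == aa for i, aa in enumerate(motif))


rows = [
    ("ASVRQAVWLFNISADP YERE", 475),
    ("DLAGQRPDVVRTLLARLAEY", 495),
    ("NRTAIPVRYPAENPRAHPDF", 515),
]
positions = residue_map(rows)

print(hit_matches(positions, 466, "NISA", 469))  # True
print(hit_matches(positions, 496, "NRTA", 499))  # True
print(hit_matches(positions, 457, "SVR", 459))   # True
```

A hit that fails this check indicates a misread character either in the Query line or in the residue row, and the codon row beneath the residue row can be used to decide between them.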
Figure labels: Colon T (x4), Colon N (x4), Lung T (x7), Lung N (x4), Breast T (x2)
Input file 26212consi Output File 26212pat
Sequence length 2266 



aVCaXrrCCGCCCAOGOGTCOGTCWAGATATTAACT^^ 

aMGAGGAGGA<WAGAAA<rt<SAAATGTXX:TGGAGAAGAGO^^ 

AT<avCTTCTGGAAGATTAAAGTTGTCGGACAT<XnX3AaVGC^ 



TCA< 



.COGTCTGTTGGGTGOVTGTGTXKXXICCGCASCGGCGCGGGGC^^ 



TGAGTGA 


MAPRGCAGHPPPPSPQAC 18
ATG GCT CCC AGG GGC TGT GCG GGG CAT r^c**^ f*f^f2 CCT TCT CCA CAG GCC TGT 54

VCPGKMLAMGALAGFWILCL 38
GTC TGT CCT GGA AAG ATG ??? ??? ATG GGG GCG CTG GCA GGA TTC TGG ATC CTC TGC CTC 114

LTYGYLSWGQALEEEEEGAL 58
CTC ACT TAT GGT TAC CTG TCC TGG GGC CAG GCC TTA GAA GAG GAG GAA GAA GGG GCC TTA 174

LAQAGEKLEPSTTSTSQPHL 78
CTA GCT CAA GCT GGA GAG AAA CTA GAG CCC AGC ACA ACT TCC ACC TCC CAG CCC CAT CTC 234

IFILADDQGFRDVGYHGSEI 98
ATT TTC ATC CTA GCG GAT GAT CAG GGA TTT AGA GAT GTG GGT TAC CAC GGA TCT GAG ATT 294

KTPTLDKLAAEGVKLENYYV 118
AAA ACA CCT ACT CTT GAC AAG CTC GCT GCC GAA GGA GTT AAA CTG GAG AAC TAC TAT GTC 354

QPICTPSRSQFITGKYQIHT 138
CAG CCT ATT TGC ACA CCA TCC AGG AGT CAG TTT ATT ACT GGA AAG TAT CAG ATA CAC ACC 414

GLQHSIIRPTQPNCLPLDNA 158
GGA CTT CAA CAT TCT ATC ATA AGA CCT ACC CAA CCC AAC TGT TTA CCT CTG GAC AAT GCC 474

TLPQKLKEVGYSTHMVGKWH 178
ACC CTA CCT CAG AAA CTG AAG GAG GTT GGA TAT TCA ACG CAT ATG GTC GGA AAA TGG CAC 534

LGFYRKECMPTRRGFDTFFG 198
TTG GGT TTT TAC AGA AAA G7^ TGC ATG CCC ACC AUA AliA i^>i TTT "GST ??? TTT TTT GGT 594

SLLGSGDYYTHYKCDSPGMC 218
TCC CTT TTG GGA AGT GGG GAT TAC TAT ACA CAC TAC AAA TGT GAC AGT CCT GGG ATG TGT 654

GYDLYENDNAAWDYDNGIYS 238
GGC TAT GAC TTG TAT GAA AAC GAC AAT GCT GCC TGG GAC TAT GAC AAT GGC ATA TAC TCC 714

TQMYTQRVQQILASHNPTKP 258
ACA CAG ATG TAC ACT CAG AGA GTA CAG CAA ATC TTA GCT TCC CAT AAC CCC ACA AAG CCT 774

IFLYIAYQAVHSPLQAPGRY 278
ATA TTT TTA TAT ATT GCC TAT CAA GCT GTT CAT TCA CCA CTG CAA GCT CCT GGC AGG TAT 834

??????IININRRRYAAMLS 298
T?C gL C^C tIc cSa T^C ATT ATC AAC ATA AAC AGG AGG AGA TAT GCT GCC ATG CTT TCC 894

?LD??INNVTLALKTYGFYN 318
tSc TTA GAT gIa G^ ATC AAC AAC GTC ACA TTG GCT CTA AAG ACT TAT GGT TTC TAT AAC 954



FIG. 15A.

NSIIIYSSDNGGQPTAGGSN 338
AAC AGC ATT ATC ATT TAC TCT TCA GAT AAT GGT GGC ??? CCT ACG GCA GGA GGG AGT AAC 1014

WPLRGSKGTYWEGGIRAVGF 358
TGG cxrr CTC AGA GGT AGC AAA GGA ACA TAT TGG GAA GGA GGG ATC CGG GCT GTA GGC TTT 1074

VHSPVLKNKGTVCKELVHIT 378
GTG CAT AGC CCA CTT CTG AAA AAC AAG GGA ACA GTG TGT AAG GAA CTT GTG CAC ATC ACT 1134

DWYPTLISLAEGQIDEDIQL 398
GAC TGG TAC CCC ACT CTC ATT TCA CTG GCT GAA GGA CAG ATT GAT GAG GAC ATT CAA CTA 1194

DGYDIWETISEGLRSPRVDI 418
GAT GGC TAT GAT ATC TGG GAG ACC ATA AGT GAG GGT CTT CGC TCA CCC CGA GTA GAT ATT 1254

?HNIDPIYTKAKKGSWAAGY 438
??? CAT AAC ATT GAC CCC ATA TAC ACC AAG GCA AAA AAT GGC TCC TGG GCA GCA GGC TAT 1314

GIWNTAIQSAIRVQHWKLLT 458
GGG ATC TGG AAC ACT GCA ATC CAG TCA GCC ATC AGA

GGACCATGGTATAGAGAGGAAACCAAGAAAAAGAAGCCAAGCAAAAATCAGGCTGAGAAAAAGCAAAAGAAAAGCAAA^



AAAGAAGAAGAAACAGCAGAAAGCAGTCTCAGGTTC7VACTTGCCATTCAGGTGTTACTTGTGGATAAG 

CCTGTTTGGTTAAACTTTAATCAGTTCTTATCTTTCATCTGTTTCCT 

GCTGGCCTAAGCGTCAGGCTTGTTTTCATGCTGTGCCACCTGGTGCCGAAT^ 



FIG. 15B. 
Prosite Pattern Matches for 26212prot

Prosite version: Release 12.2 of February 1995

> PS00001 | PDOC00001 | ASN_GLYCOSYLATION N-glycosylation site.
> PS00004 | PDOC00004 | CAMP_PHOSPHO_SITE cAMP- and cGMP-dependent protein kinase phosphorylation site.
Query: 521

> PS00005 | PDOC00005 | PKC_PHOSPHO_SITE Protein kinase C phosphorylation site.

FIG. 18A.

> PS00006 | PDOC00006 | CK2_PHOSPHO_SITE Casein kinase II phosphorylation site.
> PS00007 | PDOC00007 | TYR_PHOSPHO_SITE Tyrosine kinase phosphorylation site.
Query: 163 KLKEVGY 169

> PS00008 | PDOC00008 | MYRISTYL N-myristoylation site.
> PS00149 | PDOC00117 | SULFATASE_2 Sulfatases signature 2.
Query: 168 GYSTHMVGKW 177

> PS00523 | PDOC00117 | SULFATASE_1 Sulfatases signature 1.
Query: 120 PICTPSRSQFITG 132



FIG. 18B. 



